

Inside the Chornobyl exclusion zone – in pictures

The Guardian > Energy

A Russian drone attack has inflicted tens of millions of pounds of damage to the site of the Chornobyl nuclear power plant, according to experts. The photographer Julia Kochetova has gained access to the area


Using Noise to Infer Aspects of Simplicity Without Learning

Neural Information Processing Systems

Noise in data significantly influences decision-making in the data science process. In fact, it has been shown that noise in data generation processes leads practitioners to find simpler models. However, an open question remains: what degree of model simplification can we expect under different noise levels? In this work, we address this question by investigating the relationship between the amount of noise and model simplicity across various hypothesis spaces, focusing on decision trees and linear models. We formally show that noise acts as an implicit regularizer for several different noise models. Furthermore, we prove that Rashomon sets (sets of near-optimal models) constructed with noisy data tend to contain simpler models than the corresponding Rashomon sets built with non-noisy data. Additionally, we show that noise expands the set of "good" features and consequently enlarges the set of models that use at least one good feature. Our work offers theoretical guarantees and practical insights for practitioners and policymakers on whether simple-yet-accurate machine learning models are likely to exist, based on knowledge of noise levels in the data generation process.
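A classical instance of the implicit-regularization effect mentioned above (a toy sketch, not taken from the paper; all names and values are illustrative) is that least-squares fitting on inputs corrupted by Gaussian noise approximates ridge regression with penalty n·σ²:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true  # noiseless labels, so shrinkage comes only from input noise

def fit_ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_with_input_noise(X, y, sigma, copies=200):
    # augment the data with many independently-noised copies of the inputs
    Xs = np.concatenate([X + sigma * rng.normal(size=X.shape) for _ in range(copies)])
    ys = np.tile(y, copies)
    return fit_ols(Xs, ys)

def fit_ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

sigma = 0.5
w_noisy = fit_with_input_noise(X, y, sigma)
w_ridge = fit_ridge(X, y, lam=n * sigma**2)  # equivalent explicit regularizer
print(np.round(w_noisy, 2))
print(np.round(w_ridge, 2))
```

With enough noisy copies the two solutions coincide, and both are visibly shrunk toward zero relative to the true weights — noise has acted as an explicit L2 penalty in disguise.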


A Detailed architecture

Neural Information Processing Systems

Similar to the eDAFNO architecture shown in (6), we present the iDAFNO version by incorporating the layer-independent parameter definition characterized in the IFNO structure (You et al., 2022c): J[h](x) := h(x) + τσ( χ(x)(I(χ(·)h(·); v) − h(x)I(χ(·); v)) + Wh(x) + c ), where I(·; v) := F⁻¹[v · F[·]]. Here, τ = 1/L is the reciprocal of the total number of layers L employed. Note that the superscript l is dropped because the model parameters are layer-independent in the iDAFNO architecture, which leads to significant computational savings. A total of 3 Fourier layers with 32 Fourier modes in each direction are employed. The parameters of each method are given in the following, where the parameter choice of each model is selected by tuning the number of layers and the width (channel dimension) while keeping the total number of parameters on the same order of magnitude. To perform a fair comparison with the results reported in Li et al. (2022a), we employ the same hyperparameters here: in particular, four Fourier layers with mode 12 and width 32 are used.
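As a rough illustration of the weight-shared iDAFNO update, here is a schematic 1D numpy sketch; the grid size, the tanh activation, and all parameter values are arbitrary toy choices, not the configuration described above:

```python
import numpy as np

rng = np.random.default_rng(0)
L, n_modes, n = 4, 8, 64            # layers, retained Fourier modes, grid size
tau = 1.0 / L                        # step size: reciprocal of the layer count
v = rng.normal(size=n_modes) * 0.1   # ONE set of spectral weights, shared by all layers
W, c = 0.1, 0.0                      # shared pointwise weight and bias
chi = np.ones(n); chi[n // 2:] = 0.0 # indicator function of the (irregular) domain

def I(h, v):
    """Truncated spectral convolution: F^{-1}[ v * F[h] ]."""
    H = np.fft.rfft(h)
    H[:n_modes] *= v
    H[n_modes:] = 0.0
    return np.fft.irfft(H, n=n)

def idafno_layer(h):
    # J[h] = h + tau * sigma( chi*(I(chi*h; v) - h*I(chi; v)) + W*h + c )
    update = chi * (I(chi * h, v) - h * I(chi, v)) + W * h + c
    return h + tau * np.tanh(update)

h = rng.normal(size=n)
for _ in range(L):                   # the same parameters are reused in every layer
    h = idafno_layer(h)
print(h.shape)
```

Because `v`, `W`, and `c` are defined once and reused across all `L` iterations, the parameter count is independent of network depth — the computational saving the layer-independent formulation provides.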


GFT: Graph Foundation Model with Transferable Tree Vocabulary

Neural Information Processing Systems

Inspired by the success of foundation models in applications such as ChatGPT, and given the ubiquity of graph data, one can envision the far-reaching impact of Graph Foundation Models (GFMs), with broad applications in areas such as scientific research, social network analysis, drug discovery, and e-commerce. Despite significant progress on pre-trained graph neural networks, there have not yet been GFMs that achieve the desired performance on various graph-learning tasks. Building GFMs may rely on a vocabulary that encodes transferable patterns shared among different tasks and domains. Unlike for images and text, defining such transferable patterns for graphs remains an open question. In this paper, we aim to bridge this gap by rethinking the transferable patterns on graphs as computation trees, i.e., tree structures derived from the message-passing process. Based on this insight, we propose a cross-task, cross-domain graph foundation model named GFT, short for Graph Foundation model with transferable Tree vocabulary. By treating computation trees as tokens within the transferable vocabulary, GFT improves model generalization and reduces the risk of negative transfer. Theoretical analyses and extensive experimental studies demonstrate the transferability of computation trees and the effectiveness of GFT across diverse tasks and domains in graph learning.
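To make the notion of a computation tree concrete, here is a toy sketch (not GFT's implementation; the graph and helper name are invented for illustration) that unrolls the rooted tree induced by depth-limited message passing over an adjacency list:

```python
# toy undirected graph as an adjacency list
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def computation_tree(node, depth):
    """Unroll the message-passing computation tree rooted at `node`.

    Each child subtree corresponds to the messages one neighbor
    contributes; a k-layer GNN's prediction at `node` depends on
    exactly this depth-k tree.
    """
    if depth == 0:
        return (node, ())
    return (node, tuple(computation_tree(nb, depth - 1) for nb in graph[node]))

tree = computation_tree(0, 2)
print(tree)
```

Nodes with identical computation trees receive identical representations from a message-passing GNN, which is what makes such trees plausible "tokens" of a transferable vocabulary.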


A Additional Ablation Studies

Neural Information Processing Systems

In this section, we provide three additional ablation studies and discussions to further analyze our proposed method. These ablation studies are conducted on the iWildCam dataset. A.1 Aggregator Methods. In Table 9, we include several hand-designed aggregation operators: max-pooling, average-pooling, and two MLP-based learnable architectures. The two MLP-based learnable architectures work as follows. MLP weighted sum (MLP-WS) takes the output features from the MoE models as input and produces a score for each expert. We then weight those output features by the scores and sum them to obtain the final output for knowledge distillation.
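A minimal numpy sketch of the MLP-WS aggregator as described: expert features go through a small scoring MLP, and the resulting per-expert scores weight a sum of the features. The shapes, the two-layer MLP, and the softmax normalization of the scores are assumptions for this toy, not details stated above:

```python
import numpy as np

rng = np.random.default_rng(0)
E, d = 4, 16                        # number of experts, feature dimension
feats = rng.normal(size=(E, d))     # one output feature vector per MoE expert

# hypothetical 2-layer scoring MLP: expert features -> one scalar score each
W1, b1 = rng.normal(size=(d, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)

def softmax(z):
    z = z - z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

hidden = np.maximum(feats @ W1 + b1, 0.0)        # ReLU hidden layer
scores = softmax((hidden @ W2 + b2).ravel())     # one normalized score per expert
aggregated = (scores[:, None] * feats).sum(axis=0)  # weighted sum for distillation
print(aggregated.shape)
```

The aggregated vector has the same dimension as a single expert's features, so it can serve directly as the distillation target.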


Can Models Learn Skill Composition from Examples?

Neural Information Processing Systems

As large language models (LLMs) become increasingly advanced, their ability to exhibit compositional generalization--the capacity to combine learned skills in novel ways not encountered during training--has garnered significant attention. This type of generalization, particularly in scenarios beyond training data, is also of great interest in the study of AI safety and alignment.




Gradient Rewiring for Editable Graph Neural Network Training

Neural Information Processing Systems

Deep neural networks are ubiquitously adopted in many applications, such as computer vision, natural language processing, and graph analytics. However, well-trained neural networks can make prediction errors after deployment as the world changes. Model editing involves updating the base model to correct prediction errors with limited access to training data and computational resources. Despite recent advances in model editors for computer vision and natural language processing, editable training in graph neural networks (GNNs) is rarely explored. The challenge of editable GNN training lies in the inherent information aggregation across neighbors, which can lead model editors to unintentionally affect the predictions of other nodes.


A Task Setups

Neural Information Processing Systems

Table 4: Shared hyperparameters for all models, given for each task. We provide the hyperparameter setups shared across our models for each task in Table 4. In addition, the hyperparameters tuned for each model for the best performance are shown in Table 5; these were selected using validation performance. We also provide a textual description of some aspects of the base models below. Random Walk. We train 4-layer models with a hidden size of 256 and 4 attention heads.